
    Effect of Laughter Yoga on Pulmonary Rehabilitation in Patients with Chronic Obstructive Pulmonary Disease.

    Objective: To evaluate the clinical usefulness of laughter yoga for patients with chronic obstructive pulmonary disease (COPD) in a pulmonary rehabilitation setting. Design: Pilot study with randomization of participants. Setting: The study was conducted by the Department of Internal Medicine, Yoshino-cho National Health Insurance Yoshino Hospital. Participants: Stable outpatients with COPD (7 men and 1 woman, aged 64 to 84 years) participated in a 2-week pulmonary rehabilitation program. Intervention: The patients were divided into two groups using a sealed-envelope randomization method. The laughter yoga group had a 10-min laughter yoga session before exercise training. Patients in both groups received exercise training, educational programs, lung physiotherapy, and nutrition counseling. Outcome Measures: Health-related quality of life using the St. George's Respiratory Questionnaire (SGRQ) and the 36-item Short Form Health Survey (SF-36), depression scores using the Self-rating Depression Scale (SDS), anxiety scores using the State-Trait Anxiety Inventory (STAI), and spirometry, 6-minute walk test, and modified Medical Research Council (mMRC) dyspnea scale results were evaluated before and 2 weeks after the program in both groups. Results: There were significant improvements in the SGRQ impacts domain and the SF-36 general health domain in the laughter yoga group, while the SF-36 physical functioning domain significantly improved in the control group. SDS and STAI scores did not change significantly in either group, nor did spirometry, 6-minute walk test, or mMRC dyspnea scale results. Conclusion: Laughter yoga may improve the psychological quality of life in patients with COPD.

    Open-source Software for Developing Anthropomorphic Spoken Dialog Agents

    An architecture for a highly interactive, human-like spoken dialog agent is discussed in this paper. To easily integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface and is connected to the others through a broker (communication manager). The agent system under development is supported by the IPA and will be publicly available as a software toolkit this year.
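The module-broker design described above can be sketched in a few lines: every module exposes the same minimal interface and never talks to another module directly, only through the broker. This is a hypothetical illustration of the pattern, not the toolkit's actual API; all class, method, and message names are invented.

```python
# Minimal sketch of a broker-mediated module architecture: each module
# (recognizer, synthesizer, dialog controller, ...) is a "virtual machine"
# with one common interface, and modules communicate only via the broker.

class Module:
    """Common interface every virtual-machine module implements."""
    def __init__(self, name):
        self.name = name
        self.broker = None  # set when registered with a broker

    def receive(self, sender, message):
        raise NotImplementedError

    def send(self, target, message):
        # Modules never hold references to each other; the broker routes.
        self.broker.route(self.name, target, message)

class Broker:
    """Communication manager that connects the modules to each other."""
    def __init__(self):
        self.modules = {}

    def register(self, module):
        module.broker = self
        self.modules[module.name] = module

    def route(self, sender, target, message):
        self.modules[target].receive(sender, message)

class Recognizer(Module):
    def receive(self, sender, message):
        # Stand-in for speech recognition: forward a text result onward.
        self.send("dialog", {"text": "hello"})

class DialogController(Module):
    def __init__(self, name):
        super().__init__(name)
        self.last = None

    def receive(self, sender, message):
        self.last = (sender, message["text"])

broker = Broker()
asr = Recognizer("asr")
dialog = DialogController("dialog")
broker.register(asr)
broker.register(dialog)
asr.receive("audio-in", {"audio": b"..."})  # simulate an incoming utterance
```

The point of the indirection is that a module can be replaced (e.g. a different recognizer) without touching any other module, since only the broker knows the wiring.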

    Domain Adaptation with Augmented Data by Deep Neural Network Based Method Using Re-Recorded Speech for Automatic Speech Recognition in Real Environment

    The most effective automatic speech recognition (ASR) approaches are based on artificial neural networks (ANNs). ANNs need to be trained with an adequate amount of matched-condition data, so adapting an ASR model with augmented data that matches the real environment improves results on real data. Real-world speech recordings vary in acoustic characteristics depending on the recording channel and environment, such as the Long Term Evolution (LTE) channel of mobile telephones, where data are transmitted with voice over LTE (VoLTE) technology, or wireless pin microphones in a classroom. Acquiring data with such variation is costly. We therefore propose training ASR models with simulated augmented data and fine-tuning them for domain adaptation using deep neural network (DNN)-based simulated data together with re-recorded data. DNN-based feature transformation creates realistic speech features from recordings made in clean conditions. In this research, a comparative investigation of different recording-channel adaptation methods for real-world speech recognition is performed. The proposed method yields a 27.0% character error rate reduction (CERR) for the DNN-hidden Markov model (DNN-HMM) hybrid ASR approach and a 36.4% CERR for the end-to-end ASR approach on the target domain of LTE-channel telephone speech.
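The core idea of the DNN-based feature transformation can be sketched as a small regression network that learns to map clean-condition features to channel-matched ones, so that clean corpora can be converted into simulated in-domain training data. The sketch below uses synthetic data and a tiny NumPy network trained with plain gradient descent; the data shapes, network size, and the linear "channel" are all assumptions for illustration, not the paper's actual model.

```python
import numpy as np

# Toy stand-in for parallel clean / re-recorded features: in practice these
# would be e.g. filterbank features of the same utterances over two channels.
rng = np.random.default_rng(0)
D = 8                                   # feature dimension
clean = rng.normal(size=(512, D))       # clean-condition features
W_true = rng.normal(size=(D, D)) * 0.3  # unknown channel effect (linear here)
channel = clean @ W_true + 0.05 * rng.normal(size=clean.shape)

# One-hidden-layer feature-mapping network, trained to minimize MSE between
# transformed clean features and the observed channel features.
H = 16
W1 = rng.normal(size=(D, H)) * 0.1; b1 = np.zeros(H)
W2 = rng.normal(size=(H, D)) * 0.1; b2 = np.zeros(D)
lr = 0.05
for step in range(500):
    h = np.tanh(clean @ W1 + b1)        # hidden layer
    pred = h @ W2 + b2                  # predicted channel features
    err = pred - channel                # gradient of 0.5 * MSE
    gW2 = h.T @ err / len(clean); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    gW1 = clean.T @ gh / len(clean); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# After training, mapped clean features should be much closer to the channel
# features than the untransformed clean features are.
mapped = np.tanh(clean @ W1 + b1) @ W2 + b2
mse = float(np.mean((mapped - channel) ** 2))
baseline = float(np.mean((clean - channel) ** 2))
```

Once trained, such a mapping can be applied to any clean corpus to synthesize matched-condition data for fine-tuning the ASR model.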

    Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

    In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording, implemented by combining denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech recordings (from microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, after applying a denoising autoencoder in the cepstral domain to suppress reverberation and performing large-vocabulary continuous speech recognition (LVCSR), we adopted automatic asynchronous mobile terminal selection and environment adaptation using speech segments from the optimal mobile terminals. The proposed method was evaluated using a reverberant WSJCAM0 corpus, emitted by a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average word error rate (WER) was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4% when using multi-condition acoustic models.
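The terminal-selection step described above can be illustrated with a minimal sketch: score each asynchronous terminal's recording of an utterance and keep the best one. The paper selects terminals to drive environment adaptation; the energy-based SNR proxy used here as the score is purely an assumption for illustration, as are the signal shapes.

```python
import numpy as np

def snr_proxy(signal, frame=160):
    """Crude SNR estimate in dB: loud-frame energy over quiet-frame energy."""
    n = len(signal) // frame * frame
    frames = signal[:n].reshape(-1, frame)
    energy = np.sort(frames.var(axis=1))
    k = max(1, len(energy) // 10)
    noise = energy[:k].mean()    # quietest 10% of frames ~ noise floor
    speech = energy[-k:].mean()  # loudest 10% of frames ~ speech level
    return 10.0 * np.log10(speech / (noise + 1e-12))

def select_terminal(recordings):
    """Return the index of the recording with the highest score."""
    return int(np.argmax([snr_proxy(r) for r in recordings]))

# Toy example: the same "speech" captured by three asynchronous terminals
# with different noise levels; the cleanest capture should be selected.
rng = np.random.default_rng(1)
t = np.arange(8000)
speech = np.sin(0.05 * t) * (t > 4000)  # silence, then a tone
recs = [speech + s * rng.normal(size=t.shape) for s in (0.5, 0.05, 0.9)]
best = select_terminal(recs)
```

In the actual system the selected segments then feed environment adaptation of the acoustic model, rather than being used directly for recognition alone.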

    Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

    Summary. Galatea is a software toolkit for developing a human-like spoken dialog agent. To easily integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine having a simple common interface and connected to the others through a broker (communication manager). Galatea employs model-based speech and facial-image synthesizers whose model parameters can easily be adapted to those of an existing person, given his or her training data. The software toolkit, which runs on both UNIX/Linux and Windows operating systems, will be publicly available in the middle of 2003 [1, 2].